\documentclass{article}
\usepackage{listings}

\begin{document}

\lstset{language=Lisp}

\title{Notes on the Common Lisp type system}

Lisp includes several different kinds of types and type-like things, which are not always immediately cohesive.



\section{Primitive classes}

The oldest form of types in Lisp is what I'm going to call ``primitive classes''. Because Lisp is ``dynamically typed'', actual runtime objects usually need some kind of type information associated with them at runtime; this information is its ``primitive class''.

There may be a hierarchy of primitive classes. For example, it's common to have a few kinds of things as ``immediates''. Perhaps a fixnum is a machine word with its least significant bit zero, and everything else is a machine word with its least significant bit one, representing a pointer to an object elsewhere; these objects elsewhere consist of a header word followed by some primitive-class-dependent data. Then the top level primitive distinction is fixnum vs. non-fixnum, and within non-fixnums there is a menagerie of other primitive classes distinguished by their headers.

There may also be ``unboxed'' objects - data that is similar to a Lisp object in some sense, but which does not have a primitive class, used only among operations that don't use this information. For example, the other non-fixnum primitive classes above could have their primitive-class-dependent data as their ``unboxed'' form. Boxing and unboxing allow some parts of code to operate ``raw'' without requiring safety to be sacrificed. A function that performs lengthy and complex numerical operations with a single-float argument might first check that its argument is a single-float, extract the unboxed single-float, perform machine-level operations with it, and then box the result for returning back to the Lisp world.

Primitive classes are important because they represent what the runtime knows about ``directly''. The runtime can discriminate among objects by their primitive class almost trivially.

The standard does not mandate anything at all about primitive classes, as they are part of the runtime. The organization of the system implies that some things could stand to be primitive classes for performance reasons, but there are no requirements.

However, one important operator is tied to runtime object representation: \texttt{eq}. \texttt{eq} compares two objects and returns true if they are ``the same, identical object''. This phrase is not defined in more detail, but is usually considered to mean that they should be the same primitive object. (The notes for \texttt{eql} do say that \texttt{eq} is true for ``implementationally identical'' objects.) For compound objects like conses, symbols, or standard-objects, this also means they are the result of the same call to \texttt{cons}, \texttt{make-symbol} or \texttt{intern}, or \texttt{allocate-instance}. Numbers and characters, however, are explicitly allowed to be ``copied'' arbitrarily.

In Lisp terms, this means that \texttt{eq} on numbers and characters that are the same under \texttt{eql} has undefined results. In runtime terms, this allows an implementation to box and unbox numbers and characters (and only those) at will. A function that adds one to a number, then subtracts one again, can return a new primitive object (for bignums, for example), with the same unboxed portion, without running afoul of the standard. A function that adds the \texttt{car} and \texttt{cdr} of a cons and puts the result back in the \texttt{car}, however, cannot unbox the cons into two objects, add them, and box it back into a new cons. A common use of this freedom is in arrays. If a number is stored into an array, the number later extracted from the array does not have to be identical to the original number; therefore arrays can store unboxed numbers. A cons or struct object \emph{does} have to be identical before and after, so arrays cannot store these unboxed; they must store full boxed objects, even if those objects happen to all have the same primitive class.



\section{Classes}

Lisp objects have a class. Unlike primitive classes, classes are accessible and usable by the programmer. Each object has one class that it is a ``direct instance'' of; this is retrievable with \texttt{class-of}. Whether an object is a (possibly indirect) instance of a class can be determined with \texttt{typep}. Every class has a ``class precedence list'', a linear sequence of superclasses; these relations can be queried with \texttt{subtypep}.

The class of an object does not necessarily relate directly to its primitive class. \texttt{null} is a class, consisting of one object, \texttt{nil}. But \texttt{nil} is just a symbol, and its primitive class may just be the same as that of any other symbol. The fact that \texttt{null} is a subclass of \texttt{list} has no bearing on the primitive class relationships, and can be implemented ``artificially'' in Lisp by just specifying \texttt{null}'s class precedence list independently of the primitive class hierarchy. Another possibly common area for this would be vectors: there is only some reason to make \texttt{vector} a primitive class, and almost none for making \texttt{string} one, although they are both Lisp classes.

Some classes may be redefined. This can change their class precedence lists, and those of their subclasses, but it does not change \texttt{eq}ness of the class or of instances of the class.

The class of some objects can be changed, while preserving \texttt{eq} behavior. This might be implemented as having a primitive class for ``instances'', which have their (Lisp) classes stored in them, but there is no requirement.


\section{Types}

Types are - usually - sets of objects. Types cannot be accessed or manipulated directly; they are only referred to by names called ``type specifiers''.

Type specifiers are symbols, conses, or classes. When used as a type specifier, a class designates the set of all instances of that class - which can be time-dependent, because of \texttt{change-class}. Symbol and cons type specifiers may designate the type of a class, or types defined by the standard, or user-defined types.

Like forms, type specifiers are technically meaningless without an environment. Most functions that accept a type specifier, such as \texttt{typep}, take an environment argument for this reason. \texttt{subtypep} takes two specifiers but only one environment, meaning it cannot be used to compare specifiers with different environments.

The macro \texttt{deftype} forms a macro facility for type specifiers. All type specifiers defined by \texttt{deftype} can eventually be macroexpanded into types defined by other means, so there is not much more to say about it.

The kinds of types that can be built up by compound type specifiers form a naive set theory:

\begin{itemize}
\item Explicitly defined finite sets, via \texttt{eql} and \texttt{member}.
\item Unions and intersections of other types, via \texttt{or} and \texttt{and} respectively.
\item The absolute complement of another type, via \texttt{not}.
\item Unrestricted comprehension via \texttt{satisfies}.
\end{itemize}

This last allows any possible Lisp-computable set of objects to be referred to.

Other types that may be built up are numeric ranges, sets of characters (more conveniently than \texttt{member}), cons types, and array and complex types - more on that below.

A few standard type specifiers specify kinds of sets that cannot be specified by programmers. \texttt{keyword} is the type of symbols in the ``KEYWORD'' package; specifiers for symbols in other packages are not available. \texttt{string} is the union of all array types whose actual element type is a subtype of \texttt{character}.

Some type specifiers do not exactly refer to types, despite the name. These are: the cons form of \texttt{function}, \texttt{values}, and in some circumstances, array types. They are called ``type specifiers'' because they are used in similar circumstances.



\section{How are types used?}

There are several parts of the language that deal with types. These uses are different enough that they make it difficult to talk about what types are actually for.


\subsection{Type specifiers as predicates}

First, types can be used with \texttt{typep}, and less directly with the \texttt{typecase} family of macros. In this context, type specifiers form a domain specific language for building arbitrary predicates. Other functions can be brought in using \texttt{satisfies}, and combined with \texttt{and} etc. Special sets can be referred to with \texttt{member}. \texttt{typecase}, like all multi-conditional operators in the language, is explicitly operational, performing the first test first, the second test second, etc.

Predicates built up in this way can be quite complex, and require operations not based on classes. For example, the type specifier \texttt{(or keyword (vector bit (7)))} (in a standard environment) defines a predicate \texttt{(lambda (x) (or (and (symbolp x) (eq (symbol-package x) (find-package 'keyword))) (and (bit-vector-p x) (eql (length x) 7))))}. Of all this only \texttt{symbolp} and \texttt{bit-vector-p} are predicates for class membership. It is likely that these classes match or almost match primitive classes.

Type specifiers form a very powerful language for predicates, but parts of the type system are not usable for this. Some type specifiers - \texttt{values} and compound \texttt{function} - are not valid arguments to \texttt{typep}. Many predicates of interest, or sets, cannot be expressed without \texttt{satisfies} (which is of course begging the question), such as proper lists, arrays with elements of a given type, or symbols of a package other than \texttt{keyword}. \texttt{and} and \texttt{or} do not mandate an order of evaluation, sometimes surprising users; e.g. if \texttt{(defun cl-symbol-p (sym) (eq (symbol-package sym (find-package ``CL''))))}, the type specifier \texttt{(and symbol (satisfies cl-symbol-p))} could potentially call the function before checking that the object is a symbol.


\subsection{Type specifiers for controlling object creation}

A few operators are passed type specifiers to control what kind of objects they create. These are \texttt{merge}, \texttt{concatenate}, \texttt{make-sequence}, \texttt{map}, and \texttt{coerce}. All of these functions may create new objects; the last any objects, but the rest only sequences. None of them take an environment argument, presumably meaning that their specifiers are meaningful in the runtime environment.

All of them create different objects based on the specifier passed. \texttt{(coerce '(lambda (x) x) 'function)} makes a function, while \texttt{(coerce '(lambda (x) x) 'cons)} returns the lambda expression as it is already a cons.

The descriptions for all of these functions only mention subtype information in connection with the \texttt{list} and \texttt{vector} types. If the specifier argument is a subtype of list, they make a list. They try to determine element types and lengths for vector type specifiers; the latter is only used for assertion. This phrasing in terms of subtypes appears to be because of the variable and unspecified number of array specializations; it seems unlikely that the possibility of provided sequence types like \texttt{nil} (below) or \texttt{(cons (eql x) null)} were considered in depth.

\texttt{coerce} can return an object not of the type the specifier it's passed specifies, for \texttt{complex} specifiers. For non-sequence types, \texttt{coerce}'s ``type specifiers'' are instead specified as particular individual symbols, though with the possibility of extension by implementations. For example an implementation could probably be considered technically okay if it signaled an error when told to coerce something to type \texttt{(float 1.0 2.0)}.

The type specifier \texttt{string} has a special meaning when used with these functions, distinct from its meaning elsewhere: \texttt{(vector character)}.

The type specified by \texttt{nil} is a subtype of both \texttt{list} and \texttt{vector}, but what should be done if the provided specifier is \texttt{nil} is not specified for any of these functions, except for \texttt{map}, which does not treat it as a specifier and uses a different behavior entirely.

When they return sequences, all of these functions are required to try to extract a length from the type and signal a \texttt{type-error} if this doesn't match the length of the returned object, in safe code.

Ignoring this length check, all of these functions could have been written to take a fixed list/vector discriminator and an optional array element type instead of dealing with subtypes, though that would make it impossible to write \texttt{(concatenate 'string ...)} in favor of maybe \texttt{(concatenate-to-vector 'character ...)}.


\subsection{Type specifiers for storage specification}

As described before, it's possible to have arrays that store unboxed numbers or characters. Such specialized arrays must store, at runtime, some reference to the primitive class of the objects they store, so that boxed objects can be properly created from the array data. The reference may be indirect, or implicit, such as having a different primitive array class for each possibility. It is also possible that multiple distinct kinds of arrays storing unboxed data for the same primitive class could exist; for instance one kind of array might store bits, and another eight-bit integers, and these numbers would both be boxed with a primitive class for fixnums.

Specialized arrays can be created from Lisp by passing \texttt{make-array} an \texttt{:element-type} argument. This argument can be an arbitrary type specifier, which is obviously not really suitable for the primitive level. Therefore, types are ``upgraded'' into supertypes that the implementation knows how to make an array for. The upgraded specifier for an array can be obtained with \texttt{array-element-type}, and the upgraded specifier of a given specifier can be obtained with \texttt{upgraded-array-element-type}.

Similar considerations apply in relation to \texttt{upgraded-complex-part-type}, but there are fewer accessors available with that system. Even more implicitly, an implementation may ``upgrade'' type specifiers used in \texttt{defstruct} so that structure objects may use unboxed storage.


\subsection{Type specifiers in proclamations}

Types can be used to provide information to the compiler - expressing that a variable only ever takes on values from a given set. There are two ways to do this, declarations and \texttt{the}, but the former is defined quite clearly in terms of the latter.

How does the compiler use single value type specifiers? They can be ignored, used only for user documentation. They can be turned into assertions. They can be used for optimization.


\subsubsection{Optimization}

A compiler can use type information primarily to eliminate runtime type discrimination. Since \texttt{typep} is what discriminates, and it can check membership in any possible set, as discussed, this is hypothetically a very general process.

Probably some of the most important checks are for membership in primitive classes.

The compiler can in general use type information that is not explicitly predicatable. Values ``types'' can be inferred easily and safely; e.g., if \texttt{form1} is known to evaluate to an integer, and \texttt{form2} is known to evaluate to a string, \texttt{(values form1 form2)} is known to be exactly two values, an integer and a string. Functions with known definition can have their return values type inferred as well.

More exotic types not in the standard could be inferred as well, such as compound structures (lists, arrays, etc.) with uniformly typed elements. One important possibility used by SBCL is ``types'' that compute the result of a function call based on arguments to the function. For example, \texttt{mod} can restrict its result type to something obviously useful like \texttt{fixnum}, depending on its second argument.


\subsubsection{The nature of type proclamations}

Compound \texttt{function} specifiers can be used in \texttt{type} or \texttt{ftype} declarations (and as far as I can tell, \emph{only} in these declarations and nowhere else in the language), in which case they assert type declarations for the arguments to and return values from calls to the function, rather than anything about the function itself per se. It would be fine to do \texttt{(defun vref (array index) (aref array index))} and then, in some function body, have \texttt{(declare (ftype (function ((simple-vector (unsigned-byte 37) 7)) bit) vref))} to declare that all vectors passed to \texttt{vref} are specialized to that type but only contain bits, even though the function itself has broader behavior. (I have seen no code actually do this, but it is quite explicitly allowed by the standard.)

\texttt{type} declarations are explicitly equivalent to wrapping references and modifications of a variable in a given scope in \texttt{the} operators. One notable part of this is that one of these wrapped operators can be done ``at the moment the scope of the declaration is entered''. Almost all of the time, this will dominate all other references to a variable in that scope, meaning only \texttt{setq} has to imply additional assertions; the only possible exception is for special and closed-over variables that could be modified from other threads.

\texttt{the} presents a problem with multiple values. The type for \texttt{the} is either a values type or implicitly considered one: \texttt{(the x y)} is equivalent to \texttt{(the (values x) y)} for any non-values type specifier \texttt{x}. \texttt{the} is defined as follows: all values from its form are returned, and the consequences are undefined if, for any n, the nth type specified (if it is specified) does not match the nth value returned, or \texttt{nil} if there aren't enough values. Types specified in an \texttt{\&optional} part of the \texttt{values} specifier may have their values omitted; the type specified a \texttt{\&rest} part must match all remaining values.

So for example, the type specifier \texttt{(values)} does not specify anything, and especially does not specify that no values are returned. \texttt{(values \&rest t)} specifies that all values are of type \texttt{t}, which is equivalent to the previous specifier. \texttt{integer} is the same as \texttt{(values integer)} and specifies that at least one value is returned, and that the first value is an integer. \texttt{list} is the same as \texttt{(values list)} and specifies that either one or more values are returned and the first value is a list, or that no values are returned, because the default \texttt{nil} is of type \texttt{list}. \texttt{(values \&optional list)} is equivalent to that, but \texttt{(values \&optional integer)} is not equivalent to \texttt{(values integer)}, as it can validly correspond to no values. \texttt{(values \&rest random-state)} corresponds to any number of values, but they must all be random states.

This description of \texttt{the} directly contradicts the description of \texttt{values} specifiers, which states that they specify a lambda list for a function that could correctly receive the values if \texttt{multiple-value-call}ed. SBCL has interpreted this to mean that if a \texttt{values} specifier has lambda list keywords, this latter description should take precedence, so that the number of values can be specified much more strictly; this results in a propensity for type specifiers of the form \texttt{(values x \&optional)}, which are considered to indicate different behavior than \texttt{(values x)}.

Another possibility that at least allows a maximum number of values to be specified is \texttt{\&rest nil}, indicating that all remaining values must be of type \texttt{nil}, i.e. not exist. There is still no way to specify for example that there must be at least one value, which is of type \texttt{list}.


\subsubsection{Treating type proclamations as assertions}

In safe code me way want to treat all type declarations (treating type specifiers as in ``for declarations'') as assertions (``as predicates''). In that case, we could define a macro for \texttt{the} as follows, where for convenience the values type is replaced with a list of required types, a list of optional types, and a rest type:

\begin{lstlisting}
(defmacro checked-the (required optional rest form)
  (flet ((gen-check (form type)
           `(unless (typep ,form ',type)
              (error 'type-error :datum ,form :expected-type ',type))))
    (let ((nreq (length required))
          (nopt (length optional)))
      `(let ((vals (multiple-value-list ,form)))
         ,@(loop for req in required
                 for i from 0
                 collect (gen-check `(nth ,i vals) req))
         ,@(loop for opt in optional
                 for i from nreq
                 collect `(let ((o (nth ,i vals)))
                            (when o ,(gen-check 'o opt))))
         (mapc (lambda (x) ,(gen-check 'x rest))
               (nthcdr ,(+ nreq nopt) vals))
         (values-list vals)))))
\end{lstlisting}

For example, \texttt{(checked-the (integer cons) (random-state) integer ...)} expands into

\begin{lstlisting}
(let ((vals (multiple-value-list ...)))
  (unless (typep (nth 0 vals) 'integer)
    (error 'type-error :datum (nth 0 vals) :expected-type 'integer))
  (unless (typep (nth 1 vals) 'cons)
    (error 'type-error :datum (nth 1 vals) :expected-type 'integer))
  (let ((o (nth 2 vals)))
    (when o
      (unless (typep o 'random-state)
        (error 'type-error :datum o :expected-type 'random-state))))
  (mapc
   (lambda (x)
     (unless (typep x 'integer)
       (error 'type-error :datum x :expected-type 'integer)))
   (nthcdr 3 vals))
  (values-list vals))
\end{lstlisting}

This is functional, but would be very inefficient if used for all \texttt{the} forms and type declarations, because it saves values in a list, and potentially iterates over it. This, or something similarly involving explicit storage, is the only way in Common Lisp to do something with an unknown number of values and then return those values unchanged.

In almost all cases, a \texttt{values} specifier is not used, and therefore there will be one required type, no optional types, and a rest type of \texttt{t}. This does not solve the problem: any additional values returned by the form must still be returned, and the number of values returned by a form cannot in general be determined statically (and may be actually variable).

We can twist things the other direction and check based on the number of values used by the context of a form. Argument sto function calls, values for \texttt{let} bindings, and most other contexts only use the primary value of forms, and therefore only the primary value needs to be checked. This requires type check insertion to be delayed past source processing, but is workable.

Very few places in the language are actually strict about the number of values returned - \texttt{multiple-value-call} and the \texttt{:no-error} clause of \texttt{handler-case} are all that I am aware of. The forgiveness offered by the language can however make some optimizations difficult. An additional consequence is that there are almost no contexts (as above) in which a fixed number of values more than one is required. The form in \texttt{multiple-value-bind}, for instance, can validly return any number of values. This is probably the most common usage of multiple values by programmers, and it is quite difficult to enforce types enough to allow vectored returns and so on.

\end{document}
